Fourier Policy Gradients

Authors

  • Matthew Fellows
  • Kamil Ciosek
  • Shimon Whiteson
Abstract

We propose a new way of deriving policy gradient updates for reinforcement learning. Our technique, based on Fourier analysis, recasts integrals that arise with expected policy gradients (EPG) as convolutions and turns them into multiplications. The resulting analytical solutions allow us to capture the low-variance benefits of EPG in a broad range of settings. For the critic, we treat trigonometric and radial basis functions, two function families with the universal approximation property. The choice of policy can be almost arbitrary, including mixtures or hybrid continuous-discrete probability distributions. Moreover, we derive a general family of sample-based estimators for stochastic policy gradients, which unifies existing results on sample-based approximation. We believe that this technique has the potential to shape the next generation of policy gradient approaches, powered by analytical results.
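To give a rough sense of why trigonometric critic features make such integrals analytically tractable, the sketch below illustrates the general idea only (it is not the paper's derivation, and all function names and parameter values are hypothetical): for a Gaussian policy a ~ N(mu, sigma^2), the expectation of a cosine feature has the closed form E[cos(w*a + b)] = exp(-w^2*sigma^2/2) * cos(w*mu + b), which follows from the Gaussian characteristic function and can be checked against a Monte Carlo estimate.

```python
import numpy as np

# Illustrative sketch (assumptions, not the paper's implementation):
# a Gaussian policy a ~ N(mu, sigma^2) and a single trigonometric critic
# feature Q_hat(a) = cos(w*a + b). The closed form below is what allows
# Fourier-style critics to replace Monte Carlo integration with an
# analytical expectation.

def analytic_expectation(mu, sigma, w, b):
    """Closed-form E[cos(w*a + b)] for a ~ N(mu, sigma^2)."""
    return np.exp(-0.5 * (w * sigma) ** 2) * np.cos(w * mu + b)

def monte_carlo_expectation(mu, sigma, w, b, n_samples=1_000_000, seed=0):
    """Sample-based estimate of the same expectation, for comparison."""
    rng = np.random.default_rng(seed)
    a = rng.normal(mu, sigma, size=n_samples)
    return np.cos(w * a + b).mean()

if __name__ == "__main__":
    # Hypothetical parameter values, chosen only for the numerical check.
    mu, sigma, w, b = 0.3, 0.8, 2.0, 0.5
    print("analytic:   ", analytic_expectation(mu, sigma, w, b))
    print("monte carlo:", monte_carlo_expectation(mu, sigma, w, b))
```

Running the script shows the two numbers agreeing to within sampling error, which is the kind of exact-integration result the abstract refers to when it speaks of turning convolutions into multiplications.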


Similar Articles

Optimizing Schrödinger Functionals Using Sobolev Gradients: Applications to Quantum Mechanics and Nonlinear Optics

In this paper we study the application of the Sobolev gradients technique to the problem of minimizing several Schrödinger functionals related to timely and difficult nonlinear problems in Quantum Mechanics and Nonlinear Optics. We show that these gradients act as preconditioners over traditional choices of descent directions in minimization methods and show a computationally inexpensive way to...


Revisiting stochastic off-policy action-value gradients

Off-policy stochastic actor-critic methods rely on approximating the stochastic policy gradient in order to derive an optimal policy. One may also derive the optimal policy by approximating the action-value gradient. The use of action-value gradients is desirable as policy improvement occurs along the direction of steepest ascent. This has been studied extensively within the context of natural ...


Policy gradients in linearly-solvable MDPs

We present policy gradient results within the framework of linearly-solvable MDPs. For the first time, compatible function approximators and natural policy gradients are obtained by estimating the cost-to-go function, rather than the (much larger) state-action advantage function as is necessary in traditional MDPs. We also develop the first compatible function approximators and natural policy g...


Appendix A: Wavefront Reconstruction Based on Fourier Series

Fourier Wavefront Reconstruction Over a Rectangular Pupil The iterative and boundary methods developed here for circular and elliptical pupils are based on a method to reconstruct the wavefront over the rectangular pupil. To fully understand this development, it is necessary to first summarize the approach used by Freischlad and Koliopoulos to derive an inverse spatial filter to reconstruct a w...


Adaptive Step-size Policy Gradients with Average Reward Metric

In this paper, we propose a novel adaptive step-size approach for policy gradient reinforcement learning. A new metric is defined for policy gradients that measures the effect of changes on average reward with respect to the policy parameters. Since the metric directly measures the effects on the average reward, the resulting policy gradient learning employs an adaptive step-size strategy that ...



Journal:
  • CoRR

Volume: abs/1802.06891

Pages: -

Publication date: 2018